Selective use of gaze information to improve ASR performance in noisy environments by cache-based class language model adaptation
نویسندگان
چکیده
Using information from a person’s gaze has potential to improve ASR performance in acoustically noisy environments. However, previous work has resulted in relatively minor improvements. A cache-based language model adaptation framework is presented where the cache contains a sequence of gaze events, classes represent visual context and task, and the relative importance of gaze events is considered. An implementation in a full ASR system is described and evaluated on a set of gaze-speech data recorded in both a quiet and acoustically noisy environment. Results demonstrate that selectively using gaze events based on measured characteristics significantly increases the performance improvements in WER on speech recorded in the noisy environment from 6.34% to 10.58%. This work highlights: the need to selectively use information from gaze, to constrain the redistribution of probability mass between words during adaptation via classes, and to evaluate the system with gaze and speech collected in environments that represent the real-world utility.
منابع مشابه
Exploiting a 'gaze-Lombard effect' to improve ASR performance in acoustically noisy settings
Previous use of gaze (eye movement) to improve ASR performance involves shifting language model probability mass towards the subset of the vocabulary whose words are related to a person’s visual attention. Motivated to improve Automatic Speech Recognition (ASR) performance in acoustically noisy settings by using information from gaze selectively, we propose a ‘Selective Gaze-contingent ASR’ (SG...
متن کاملThe selective use of gaze in automatic speech recognition
The performance of automatic speech recognition (ASR) degrades significantly in natural environments compared to in laboratory assessments. Being a major source of interference, acoustic noise affects speech intelligibility during the ASR process. There are two main problems caused by the acoustic noise. The first is the speech signal contamination. The second is the speakers’ vocal and non-voc...
متن کاملLanguage Model Adaptation Using Dirichlet Class Language Model Based on Part-of-Speech
Language modeling has many applications in a large variety of domains. Performance of this model depends on its adaptation to a particular style of data. Accordingly, adaptation methods endeavour to apply syntactic and semantic characteristics of the language for language modeling. The previous adaptation methods such as family of Dirichlet class language model (DCLM) extract class of history w...
متن کاملInstance-based on-line language model adaptation
Language model (LM) adaptation is needed to improve the performance of language-based interaction systems. There are two important issues regarding LM adaptation; the selection of the target data set and the mathematical adaptation model. In the literature, usually statistics are drawn from the target data set (e.g. cache model) to augment (e.g. linearly) background statistical language models,...
متن کاملBuilding an ASR system for noisy environments: SRI's 2001 SPINE evaluation system
We describe SRI’s recognition system as used in the 2001 DARPA Speech in Noisy Environments (SPINE) evaluation. The SPINE task involves recognition of speech in simulated military environments. The task had some unique challenges, including segmentation of foreground speech from noisy background, the need for robust acoustic models to handle noisy speech, and development of language models from...
متن کامل